03. Metrics
Metrics for Inference Deployment
In 2012, AlexNet became the first DNN architecture to win the ImageNet classification challenge, with a top-5 error rate of 15.4%, resoundingly beating the next best entry's 26.2%. Ever more complex and accurate DNNs have been developed against the ImageNet benchmark ever since, including VGGNet, ResNet, Inception, GoogLeNet, and their many variations. The increased accuracy is the result of breakthroughs in design and optimization, but it comes at a cost in computational resources.
Analysis
An analysis (Canziani et al., 2016) of state-of-the-art DNNs, using additional computation metrics, provides insight into the design constraints of deployable systems that use DNNs for inference. Fourteen top architectures were trained on ImageNet, deployed on an NVIDIA Jetson TX1, and compared across the following metrics:
- Top-1 Accuracy
- Operations Count
- Network Parameters Count
- Inference Time
- Power Consumption
- Memory Usage
The following table provides a sampling of the results (values are approximated from graphs in the paper), including a derived metric called information density. Information density is a measure of a network's efficiency: how much accuracy it delivers per million parameters it requires.
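The information density metric can be computed directly from a network's accuracy and parameter count. A minimal sketch in Python, using illustrative values rather than the paper's measured numbers:

```python
def information_density(top1_accuracy_pct, num_params_millions):
    """Accuracy delivered per million parameters (higher = more efficient)."""
    return top1_accuracy_pct / num_params_millions

# Illustrative values only -- see the paper's graphs for measured numbers.
print(information_density(80.0, 40.0))   # 2.0 accuracy-% per million params
print(information_density(70.0, 140.0))  # 0.5 accuracy-% per million params
```

By this measure, the second (hypothetical) network is far less efficient even though its raw accuracy is only modestly lower.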
Note that only results for a batch size of one are included. In most cases, a larger batch size speeds up inference while the relative performance among architectures stays the same. One exception is AlexNet, which sees a 3x speedup when going from 1 to 64 images per batch, due to weak optimization of its fully connected layers. See the paper for a much more detailed summary!
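The batching effect can be made concrete with a quick throughput calculation. The latency numbers below are hypothetical, chosen only to illustrate the 3x shape of the AlexNet result:

```python
def throughput_fps(batch_size, batch_latency_s):
    """Images processed per second for a given batch size and per-batch latency."""
    return batch_size / batch_latency_s

# Hypothetical latencies: a batch of 64 takes far less than 64x the batch-1 time,
# so per-image throughput rises even though per-batch latency grows.
single = throughput_fps(1, 0.015)    # batch of 1 in 15 ms
batched = throughput_fps(64, 0.320)  # batch of 64 in 320 ms
print(batched / single)              # 3.0x throughput gain
```

The trade-off matters for robotics: batching helps throughput, but a batch-1 deployment minimizes the latency of any single camera frame.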
Conclusions
The Canziani analysis paper concludes with some key insights that are useful when optimizing a deployable robotic system using inference:
- Power consumption is independent of batch size and architecture
- “When full resources utilisation is reached, generally with larger batch sizes, all networks consume roughly an additional 11.8W”
- Accuracy and inference time are in a hyperbolic relationship
- “a little increment in accuracy costs a lot of computational time”
- Energy constraint is an upper bound on the maximum achievable accuracy and model complexity
- “if energy consumption is one of our concerns, for example for battery-powered devices, one can simply choose the slowest architecture which satisfies the application minimum requirements”
- The number of operations is a reliable estimate of the inference time.
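To see how the energy constraint bounds a deployed system, consider a back-of-the-envelope battery calculation: energy per inference is power multiplied by inference time. All numbers below are hypothetical, not taken from the paper:

```python
def inferences_per_charge(battery_wh, power_w, latency_s):
    """Inferences a battery can sustain, assuming the device only runs inference."""
    energy_per_inference_j = power_w * latency_s  # joules per inference
    battery_j = battery_wh * 3600.0               # convert Wh to joules
    return battery_j / energy_per_inference_j

# Hypothetical: 50 Wh battery, 10 W draw during inference, 50 ms per inference.
print(int(inferences_per_charge(50.0, 10.0, 0.050)))  # 360000 inferences
```

Under these assumptions, a slower but lower-accuracy architecture with half the latency would double the inference budget per charge, which is exactly the trade-off the paper's energy-constraint conclusion describes.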
Metrics Quiz
SOLUTION:
VGG has an outsized number of parameters compared to the other architectures, which directly affects the efficiency calculation.
Resources
- Canziani, Alfredo, Adam Paszke, and Eugenio Culurciello. "An analysis of deep neural network models for practical applications." arXiv preprint arXiv:1605.07678 (2016). https://arxiv.org/pdf/1605.07678.pdf
- Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012. https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
- Adit Deshpande. "The 9 Deep Learning Papers You Need To Know About (Understanding CNNs Part 3)." Adeshpande3.github.io. 17 Dec. 2017. Web. 12 Jan. 2018. https://adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
- Dave Gershgorn. "The data that transformed AI research—and possibly the world." Quartz. n.d. Web. 14 Jan. 2018. https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/